[quantization][DRAFT] Disk space consumption improvements for full model quantization #495

Draft

stamalakhov wants to merge 2 commits into Samsung:main from stamalakhov:quant_full_model_impr_size

Conversation

@stamalakhov
Contributor

@stamalakhov stamalakhov commented Feb 16, 2026

This PR fixes the population of static `causal_mask`/`position_embeddings` through the layers to save disk space.

It precomputes the static `causal_mask`/`position_embeddings` once for use in llama/quant_decoder_layer, so that each quantized decoder layer is no longer populated with its own copy of these statically computed parameters, which saves disk space.

With this PR, the circle model for HuggingFaceTB/SmolLM2-135M-Instruct is just 105 MiB (vs. 300 MiB in #492).
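For context, here is a minimal sketch of the idea (not the PR's actual code; the layer interface, `build_static_inputs`, and `QuantLlamaModel` are hypothetical): the static causal mask and rotary position embeddings are computed once on the parent model and passed into every quantized decoder layer, instead of being duplicated as constants inside each exported layer.

```python
# Hypothetical sketch: share one copy of the static causal mask and
# position embeddings across all quantized decoder layers.
import torch


def build_static_inputs(seq_len, head_dim, rope_base=10000.0, dtype=torch.float32):
    # Lower-triangular causal mask, identical for every layer.
    causal_mask = torch.full((seq_len, seq_len), float("-inf"), dtype=dtype)
    causal_mask = torch.triu(causal_mask, diagonal=1)

    # Rotary position embedding cos/sin tables, also layer-independent.
    inv_freq = 1.0 / (rope_base ** (torch.arange(0, head_dim, 2, dtype=dtype) / head_dim))
    positions = torch.arange(seq_len, dtype=dtype)
    freqs = torch.outer(positions, inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)
    return causal_mask, (emb.cos(), emb.sin())


class QuantLlamaModel(torch.nn.Module):
    def __init__(self, layers, seq_len, head_dim):
        super().__init__()
        self.layers = torch.nn.ModuleList(layers)
        # Stored once on the model rather than inside every layer, so the
        # exported/quantized artifact carries a single copy on disk.
        mask, (cos, sin) = build_static_inputs(seq_len, head_dim)
        self.register_buffer("causal_mask", mask, persistent=False)
        self.register_buffer("pos_cos", cos, persistent=False)
        self.register_buffer("pos_sin", sin, persistent=False)

    def forward(self, hidden_states):
        # Each (hypothetical) quantized decoder layer receives the shared
        # static tensors as inputs instead of owning baked-in copies.
        for layer in self.layers:
            hidden_states = layer(
                hidden_states,
                attention_mask=self.causal_mask,
                position_embeddings=(self.pos_cos, self.pos_sin),
            )
        return hidden_states
```

With N decoder layers, this keeps one mask/embedding set in the exported model instead of N, which is where the reported size reduction would come from.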

Draft: #436

TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>

This PR quantizes the full `LLama` model and converts it to circle format.

TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>
This PR fixes the population of static `causal_mask`/`position_embeddings` through the layers to save disk space.

TICO-DCO-1.0-Signed-off-by: s.malakhov <s.malakhov@partner.samsung.com>
@stamalakhov stamalakhov self-assigned this Feb 16, 2026
@stamalakhov stamalakhov changed the title [quantization][DRAFT] Improvements in disk space for full model quantization [quantization][DRAFT] Disk space consumption improvements for full model quantization Feb 16, 2026